\init{
\input cmpdes
\input temacs
\def\draft{T}
\titlepage
\newpage
\setcount0 11
\parttitle{TEIRESIAS: APPLICATIONS OF META-LEVEL KNOWLEDGE}
}
\baselineskip 12pt
\chapbegin{2}		% Here beginneth Chapter 2
\chaptertitle{BACKGROUND}
\rjustline{\:B Background}
\tenpoint
\vskip 13.5pc

\epigraph{It vexes me what ails him.}
\author{Sophocles}
\source{Oedipus the King, line 74}
\epigraphend
\sectionbegin[1]{Introduction}
The first part of this chapter provides a brief overview of {\TEIRESIAS} and the
sort of performance programs it is designed to help build.  We describe the
knowledge representations used, review the control structure, and
introduce several concepts and terms that will be useful vocabulary in
subsequent chapters.

     Section 2--4 explores some general ideas about production
rules, showing how they have been used to develop what amounts to a high-level
language and indicating how this language forms the basis for many of
the capabilities discussed later.

     Section 2--5 considers briefly the problem of high performance {\vs}
generality and examines the work we have done in that light, considering in
particular the benefits of a hierarchical layering of types of knowledge.

\sectionskip
\sectionbegin[2]{Computer consultants}
\index{computer consultants}The recent growth of interest in the
class of programs known as
computer consultants can be seen as a logical consequence of the two trends
noted in chapter 1---an emphasis on large stores of domain-specific
knowledge and the concentration on problems taken from real world settings.
These programs are intended to provide expert-level advice on a difficult
{\index{expertise}}cognitive problem, perhaps one for which human expertise is in short
supply.\ffnote{The concept is defined here in terms that assume the existence of
human experts to insure that knowledge in the domain is sufficiently
advanced that it can reliably support high performance.  Human experts also
supply one of the best sources for this knowledge (learning from textbooks
has also been considered) and offer a standard by which to gauge program
performance.  The definition includes only cognitive tasks, to minimize
complicating factors that result from the reliance of human performance on
highly specialized processors (as appears to be true for vision and
manipulation).}

     Several programs of this type are currently under development.  A
program for diagnosis of glaucoma is described in [Kulikowski73]; internal
medicine is the domain for another effort described by [Pople75]; and work
on a consultation program for electro-mechanical repair tasks is reported
in [Hart75].  These programs all rely on large stores of domain-specific
knowledge for their performance, and thus could be considered candidates
for the kind of performance program {\TEIRESIAS} has been designed to help
construct.

\subsectionbegin[2.1]{Mycin}
One such program, the {\sc MYCIN} system ([Shortliffe76],
[Davis77b]), provided the context in which {\TEIRESIAS} was developed and played the
role of the performance program in Figure 1--3.  {\sc MYCIN} was
designed to provide consultative advice on diagnosis and therapy for
infectious diseases.  Such advice is often required in the hospital because
the attending physician may not be an expert on infectious disease---as, for
example, when a cardiology patient develops an infection after heart
surgery. Time considerations compound the problem. A specimen (blood,
urine, etc.) from a patient can show some early evidence of bacterial
growth within 12 hours, but 24 to 48 hours (or more) are usually required
for positive identification.  The physician must therefore often decide in
the absence of complete information whether or not to start treatment and
what drugs to use if treatment is required.  Both of these may be difficult
questions.

     Figures 2--1 and 2--2 show the
initial and final parts of a sample interaction between a physician and the
program (italicized comments at the right provide additional commentary but
are not part of the actual dialog).  {\sc MYCIN} in effect ``interviews''
the doctor about his patient, collecting information that will allow it to
infer the diagnosis and select an appropriate therapy.

\topinsert{\vskip 43pc \vfill		% Figure 2-1
\figtitle 2-1{The initial part of a consultation.}}

% --------PATIENT-1--------
% 1) Patient's name: (first-last)
% **FRED SMITH                                {Background data.}
% 2) Sex:
% **MALE
% 3) Age:
% **55
% 4) Have you been able to obtain positive cultures from a site 
%    at which Fred Smith has an infection?
% **YES
%   --------INFECTION-1--------
%   5) What is the infection?
%   **PRIMARY-BACTEREMIA
%   6) Please give date and time when signs or symptoms of the
%      primary-bacteremia (INFECTION-1) first appeared.
%   **MAY 5, 1977  18:00
%   The most recent positive culture associated with the primary-
%   bacteremia (INFECTION-1) will be referred to as:
%     --------CULTURE-1--------
%     7) From what site was the specimen for CULTURE-1 taken?
%     **BLOOD
%     8) Please give the date and time when this blood culture 
%        (CULTURE-1) was obtained. (mo/da/yr)
%     **MAY 9, 1975
%     The first significant organism from this blood culture 
%     (CULTURE-1) will be referred to as:
%       --------ORGANISM-1--------
%       9) Enter the identity of ORGANISM-1:
%       **UNKNOWN
%               {Typically the identity is not known and must be inferred.}
%       10) Is ORGANISM-1 a rod or coccus (etc.):
%       **ROD
%       11) The gram stain of ORGANISM-1:
%       **GRAMNEG
%       12) Have there been POSITIVE cultures from sites which
%           are NOT associated with infections about which you
%           are seeking advice?
%       **NO
%       13) Did ORGANISM-1 grow aerobically?
%       **YES (.8)
%               ...
% {More questions follow, to gather sufficient information to infer
% possible identities of the infecting organisms.}
\topinsert{\vskip 43pc \vfill		% Figure 2-2
\figtitle 2-2{The final part of a consultation.}}

% My therapy recommendation will be based on the following 
% probable infection(s) and potential causative organism(s):
% 
% INFECTION-1 is PRIMARY-BACTEREMIA            {Possible identities.}
% <1>  The identity of ORGANISM-1 may be PSEUDOMONAS-AERUGINOSA 
% <2>  The identity of ORGANISM-1 may be KLEBSIELLA-PNEUMONIAE
% <3>  The identity of ORGANISM-1 may be E.COLI
% <4>  The identity of ORGANISM-1 may be BACTEROIDES-FRAGILIS
% <5>  The identity of ORGANISM-1 may be ENTEROBACTER
% <6>  The identity of ORGANISM-1 may be PROTEUS-NON-MIRABILIS
% 
% 46) On a scale of 0 to 4 (higher numbers indicate increasing
%     severity), how would you rate Fred Smith's degree of 
%     sickness?
% **3
% 47) Does Fred Smith have a clinically significant allergic 
%     reaction to any antimicrobial agent?
% **NO
% 48) Fred Smith's weight in kilograms (or <number> POUNDS):
% **70
%               {Note that 2 drugs cover for 6 potential organisms.}
% [Rec 1] My preferred therapy recommendation is as follows:
%    In order to cover for Items <1 2 3 5 6>:
%         Give: GENTAMICIN
%         Dose: 119 mg (1.7 mg/kg) q8h IV [or IM] for 10 days
%         Comments: Modify dose in renal failure
%    In order to cover for Item <4>:
%         Give: CLINDAMYCIN
%         Dose: 595 mg (8.5 mg/kg) q6h IV [or IM] for 14 days
%         Comments: If diarrhea or other GI symptoms develop, 
%                   patient should be evaluated for possible 
%                   pseudomembranous colitis.
% 
% Do you wish to see the next choice therapy?
% **YES                  {Alternative choices are available.}
% Which drug do you wish eliminated from consideration?
% **CLINDAMYCIN
% [Rec 2] Next best therapy recommendation:
%    In order to cover for Items <2 3 4 5 6>:
%         Give: CHLORAMPHENICOL
%         Dose: 1 gm (15 mg/kg) q6h IV for 14 days
%         Comments: Monitor patient's white count
%    In order to cover for Item <1>:
%         Give: GENTAMICIN
%         Dose: 119 mg (1.7 mg/kg) q8h IV [or IM] for 10 days
%         Comments: Modify dose in renal failure

\sectionskip
\sectionbegin[3]{Teiresias: System organization}
As we have noted, {\TEIRESIAS} provides a number of tools designed to aid in the
construction, maintenance, and use of the knowledge base in the performance
program.  The overall structure of the system is shown in 
Figure 2--3, a more detailed version of Figure 1--3.  In
order to focus on the issues of knowledge acquisition, explanation,
etc., covered in later chapters, we will henceforth adopt the perspective
suggested in Figure 2--3
and consider the performance program as the simple entity
indicated there.    {\sc MYCIN} is of course more complex, but this abstraction
contains all the detail necessary for our purposes.\index{strategies}

% 
%      +-------------------------------------------------------+
%      |  TEIRESIAS                                            |
%      |                           +---------------------+     |
%      |                           | PERFORMANCE PROGRAM |     |
%      |   +-------------+         |    +-----------+    |     |
% E <------| EXPLANATION |<-------------| INFERENCE |    |     |
%      |   |             |         |    |  ENGINE   |    |     |
% X    |   +-------------+         |    +-----------+    |     |
%      |                           |    +-----------+    |     |
% P    |   +-------------+         |    | KNOWLEDGE |    |     |
%   ------>| KNOWLEDGE   |------------->|   BASE    |    |     |
% E    |   | ACQUISITION |         |    +-----------+    |     |
%      |   +-------------+         |          ^          |     |
% R    |                           +----------|----------+     |
%      |          +------------+              |                |
% T -------------->| STRATEGIES |--------------+                |
%      |          +------------+                               |
%      +-------------------------------------------------------+
% 

\vskip 23pc
\figtitle 2-3{{\TEIRESIAS}:  System organization.}

\index{explanation}{\TEIRESIAS}'s {\sl explanation} facility
(see chapter 3) uses both the information in the
{\index{knowledge base}\index{inference engine}}knowledge base and an understanding
of the design of the inference engine to provide
the expert with explanations of the performance program's behavior.  The\index{knowledge acquisition}
{\sl knowledge acquisition} facility (chapters 4--6) allows the expert to transfer
his knowledge of the field into the knowledge base, in order to increase the
performance program's competence.  Finally, the base of {\sl strategy knowledge}
(chapter 7) provides a mechanism for expressing strategies concerning the use of
information in the knowledge base.\ffnote{Ideally, it should be possible for the expert
to add strategy knowledge to the system using the knowledge acquisition
facility, but this has not as yet been implemented.  See chapter 7 for details.}

     {\TEIRESIAS} is written in {\sc INTERLISP}, an advanced dialect of the 
{\sc LISP} language, and runs on a DEC PDP-10 under Tenex.  The knowledge acquisition
program occupies approximately 40,000 (36-bit) words; the explanation
program, 10,000; and the strategy knowledge base, 5,000.

\subsectionbegin[3.1]{Performance Program: Knowledge Base Organization}

\comment{.beginind knowledge base}
\epigraph{In time you will know all with certainty.}
\author{Sophocles}
\source{Oedipus the King, line 613}
\epigraphend

\noindent
{\TEIRESIAS} is designed to deal with knowledge encoded in the form of inference
{\index{inference rules}}rules of the sort shown in Figure 2--4.
The rules are stored internally in the
{\sc INTERLISP} form shown, from which
the English version is generated.  Each
rule is a single ``chunk'' of domain-specific information indicating an
{\sl action} (in this case a conclusion) that is justified if the conditions
specified in the {\sl premise} are fulfilled.

% RULE050
% -------
% 
% If  1) the infection is primary-bacteremia, and
%     2) the site of the culture is one of the sterile sites, and
%     3) the suspected portal of entry of the organism is the 
%        gastrointestinal tract,
% then there is suggestive evidence (.7) that the identity of the 
%      organism is bacteroides.
% 
% 
% PREMISE    ($AND (SAME CNTXT INFECT PRIMARY-BACTEREMIA)
%                  (MEMBF CNTXT SITE STERILESITES)
%                  (SAME CNTXT PORTAL GI))
% ACTION     (CONCLUDE CNTXT IDENT BACTEROIDES TALLY .7)

\vskip 14pc
\figtitle 2-4{An inference rule.}

\noindent
Note that the rules are judgmental, that is, they make inexact inferences.  In the case of
the rule in Figure 2--4, for instance, the evidence cited in the premise is
enough to assert the conclusion shown with a mild degree of confidence: 
 .7 out of 1.0.  This  number is called a ``certainty factor,'' or CF, and
embodies a model of
confirmation described in [Shortliffe75b].  The details of this model need not
concern us here; we need only note that the rules typically embody
inexact inferences
rather than statements made with certainty.\ffnote{$\$$AND (the multivalued
analogue of the Boolean AND)
performs a minimization operation; $\$$OR (similar) does a maximization.  Note
that, unlike standard probability theory, $\$$AND does not involve any
multiplication over its arguments.  Since CFs are not probabilities, there is no
{\it a priori} reason why a product should be a reasonable number. There is,
moreover, a long-standing convention in work with multivalued logics which
interprets AND as $min$ and OR as $max$ [Lukasiewicz70]. It is based
primarily on intuitive grounds: If a conclusion requires all of its antecedents
to be true, then it is a relatively safe and conservative strategy to use the
smallest of the antecedent values as the value of the premise.  Similarly, if
any one of the antecedent clauses justifies the conclusion, it is safe to take
the maximum value.}
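
To make the combining functions concrete, here is a minimal sketch in
Python (the notation is ours, purely for illustration; the final
attenuation of the conclusion by the rule's own CF reflects our reading
of the model, not {\sc MYCIN}'s actual code):

% def cf_and(*cfs):
%     # $AND: a conjunction is only as certain as its weakest clause
%     return min(cfs)
% 
% def cf_or(*cfs):
%     # $OR: any single clause suffices, so take the strongest
%     return max(cfs)
% 
% # Three premise clauses with CFs of .9, 1.0, and .8; the rule's own
% # CF of .7 (as in Figure 2-4) then attenuates the premise tally.
% tally = cf_and(0.9, 1.0, 0.8)        # 0.8, the minimum
% conclusion_cf = 0.7 * tally          # 0.56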

     The premise of each rule is a Boolean combination
of one or more {\sl clauses}, each of which is constructed from a
{\sl predicate\index{associative triple}
function} with an {\sl associative triple} ({\sl attribute, object, value}) as
its argument.  Thus each clause of a premise has the following four
components:
$$\ctrline{\rm\<predicate function>\qquad\<object>\qquad\<attribute>\qquad\<value>}$$
For the third clause in the premise of Figure 2--4, for example, the predicate
function is {\tt SAME}, and the triple is ``{\sl portal-of-entry} of {\sl organism}
is {\sl GI-tract}.'' ({\tt CNTXT} is a free variable which is bound to the specific 
object [also called a ``context''] for which the rule is invoked.)
There is a standardized set of some 24 domain-independent predicate functions
(\eg, {\tt SAME, KNOWN, DEFINITE}) and a range of domain-specific attributes
(\eg, {\tt IDENTITY, SITE}), objects (\eg, {\tt ORGANISM, CULTURE}), and
associated values (\eg, {\tt E.COLI, BLOOD}).  These form a ``vocabulary'' of
conceptual primitives available for use in constructing rules.

	A rule premise is always a conjunction of clauses, but it may contain
arbitrarily complex conjunctions or disjunctions nested within each clause.
(Instead of writing rules whose premise would be a disjunction of clauses, a
separate rule is written for each clause.)  The action part indicates one or
more conclusions that can be drawn if the premises are satisfied, making the
rules purely inferential.
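
The shape of such a rule can be fixed with a short sketch (illustrative
Python with invented names; the real objects are the {\sc INTERLISP}
forms of Figure 2--4):

% from dataclasses import dataclass
% 
% @dataclass
% class Clause:
%     predicate: str      # e.g. "SAME" or "MEMBF"
%     attribute: str      # e.g. "INFECT", "SITE", "PORTAL"
%     value: str          # e.g. "PRIMARY-BACTEREMIA"; the object is
%                         # supplied at run time by binding CNTXT
% 
% @dataclass
% class Rule:
%     premise: list       # a conjunction of Clauses (disjunctions
%                         # may nest within a single clause)
%     action: tuple       # (attribute, value, cf): the conclusion
% 
% RULE050 = Rule(
%     premise=[Clause("SAME",  "INFECT", "PRIMARY-BACTEREMIA"),
%              Clause("MEMBF", "SITE",   "STERILESITES"),
%              Clause("SAME",  "PORTAL", "GI")],
%     action=("IDENT", "BACTEROIDES", 0.7))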

	Each rule is intended to embody a single, independent chunk of
knowledge and  states all necessary information explicitly in the premise.  Since
the rule uses a vocabulary of concepts common to the domain, it forms,
by itself, a comprehensible statement of some piece of domain knowledge.  As will
become clear, this characteristic is useful in many ways. 

	Each rule is highly stylized, with the if/then format and the specified
set of available primitives.  While the {\sc LISP} form of each is executable
code (the premise, in fact, is simply EVALuated by {\sc LISP} to test its
truth, and the action is EVALuated to make its conclusions), this tightly
structured form makes it possible to examine the rules as well as execute them.
This, in turn, leads to some important capabilities, described below.  As
one  example, the internal form (\ie, {\sc LISP})  can be
translated into the English form shown in Figure 2--4.

	Facts about the world (Figure 2--5) are represented as 4-tuples made
up of an 
associative triple and its current CF.  Positive CFs indicate a predominance of
evidence confirming a hypothesis; negative CFs indicate predominance of
disconfirming evidence.

%               (IDENT ORGANISM-2 KLEBSIELLA .25)
%               (IDENT ORGANISM-2 E.COLI .83)
%               (SENSITIVS ORGANISM-1 PENICILLIN -1.0)

\topinsert{\vskip 8pc \figtitle 2-5{Examples of representation of facts.}}

\noindent
Note that the model of inexact logic permits the coexistence of several
plausible values for a single attribute, if suggested by the evidence.
Thus, for example, after attempting to deduce the identity ({\tt IDENT}) of an
organism, {\sc MYCIN} may have concluded (correctly) that there is
evidence both for E.coli and for Klebsiella.
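
A fact store with this behavior takes only a few lines to sketch
(illustrative Python, not the actual {\sc INTERLISP} structures):

% # Facts are 4-tuples (attribute, object, value, CF).  Several values
% # of one attribute may coexist, each with its own degree of
% # confirmation.
% facts = {}
% 
% def record(attribute, obj, value, cf):
%     facts.setdefault((attribute, obj), {})[value] = cf
% 
% record("IDENT", "ORGANISM-2", "KLEBSIELLA", 0.25)
% record("IDENT", "ORGANISM-2", "E.COLI", 0.83)
% record("SENSITIVS", "ORGANISM-1", "PENICILLIN", -1.0)
% 
% # Both hypotheses about ORGANISM-2's identity are retained:
% print(facts[("IDENT", "ORGANISM-2")])
% # {'KLEBSIELLA': 0.25, 'E.COLI': 0.83}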

	There are thus two major forms of knowledge representation in use
in the performance program:  ({\it i\/}) the attributes, objects, and
values---which form a vocabulary of domain-specific conceptual primitives,
and ({\it ii\/}) the inference rules expressed in terms of these primitives.
There are, correspondingly, two forms of knowledge acquisition: ({\it i\/})
the acquisition of new primitives---to expand the performance program's
vocabulary of concepts, and ({\it ii\/}) the acquisition of new rules expressed
in terms of existing primitives.  {\TEIRESIAS} makes possible both of these; chapter
6 deals with the first, and chapter 5 describes the second.

\comment{.endind knowledge base}
\subsectionbegin[3.2]{Performance Program: The Inference Engine}

\epigraph{Know that I have gone many ways wandering in thought.}
\author{Sophocles}
\source{Oedipus the King, lines 66--67}
\epigraphend

\index{backward chaining}
\comment{.beginind inference engine}
\index{inference rules}
\index{goal tree}

\noindent
The rules are invoked in a simple backward-chaining fashion that
produces an exhaustive depth-first search of an and/or goal tree.  Assume
that the program is attempting to determine the identity of an infecting
organism.  It retrieves all the rules that make a conclusion about that
topic (\ie, they mention {\tt IDENT} in their action) and invokes each one
in turn, evaluating each premise to see if the conditions specified have
been met.  For the rule in Figure 2--4, this process would begin with
determining the type of the infection.  This, in turn, is set up as a
subgoal and the process recurs.

	The search is thus depth-first (because each premise condition is
thoroughly explored in turn); the tree that is sprouted is an and/or goal
tree (because rules may have OR conditions in their premise); and the
search is exhaustive (because the rules are inexact: even if one
succeeds, it was deemed a wisely conservative strategy to continue to
collect all evidence about the subgoal).\index{subgoal}
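
The control loop just described can be summarized as follows (a Python
sketch; the knowledge-base and fact-store helpers are assumed rather
than real, and the .2 firing threshold anticipates the empirical
threshold discussed below):

% def trace(attribute, obj, kb, facts):
%     # Exhaustive depth-first tracing of a subgoal: EVERY rule that
%     # concludes about `attribute' is invoked, since each inexact
%     # rule contributes only some of the evidence.
%     for rule in kb.rules_concluding(attribute):  # retrieval by content
%         tally = 1.0
%         for clause in rule.premise:
%             # An untraced premise attribute becomes a subgoal in
%             # turn, and the process recurs.
%             if not facts.traced(clause.attribute, obj):
%                 trace(clause.attribute, obj, kb, facts)
%             tally = min(tally, clause.cf(facts, obj))   # $AND
%         if tally > 0.2:                          # premise satisfied
%             attr, value, cf = rule.action
%             facts.conclude(attr, obj, value, cf * tally)
%     facts.mark_traced(attribute, obj)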

	Note that the subgoal that is set up is a generalized form of the
original goal. Thus, for the first clause in Figure 2--4 (``the infection
is primary-bacteremia''), the subgoal set up is ``determine the type of
infection.'' The subgoal is therefore always of the form ``determine the
value of \<attribute>'' rather than ``determine whether the \<attribute> is
equal to \<value>.''  By setting up the generalized goal of collecting all
evidence about an attribute, the performance program effectively exhausts
each subject as it is encountered, and thus tends to group together all
questions about a given topic.  This results in a system that displays a
much more focused, methodical approach to the task, which is a distinct
advantage where human engineering considerations are important.  The cost
is the effort of deducing or collecting information that is not strictly
necessary.  However, since this occurs rarely---only when the \<attribute>
can be deduced with certainty to be the \<value> named in the original
goal---it has not proven to be a problem in practice.

	If after trying all relevant rules (referred to as ``tracing'' the
subgoal), the total weight of the
evidence about a hypothesis falls between $-.2$ and $.2$ (an empirical
threshold), the answer is regarded as still unknown.  This may happen if no
rules are applicable, if the applicable rules are too weak, if the
effects of several rules offset each other, or if there are no rules for
this subgoal at all.  In any of these cases, when the system is unable to
deduce the answer, it asks the user for the value of the subgoal (using a
phrase that is stored along with the attribute itself).

	The strategy of always attempting to deduce the value of a subgoal
and asking the user only when deduction fails insures a minimum number of
questions.  It can also mean, however, that work is expended
searching for a subgoal, arriving perhaps at a less than definite answer,
when the user might already know the answer with certainty.  To prevent
this inefficiency, some of the attributes have been labeled ``laboratory
data,'' to indicate that they represent information available to the
physician as results of quantitative tests.  In these cases the
deduce-then-ask procedure is reversed and the system will attempt to
deduce the answer only if the user cannot supply it.  Given the desire to\index{tree search}
minimize both tree search and the number of questions asked, there is no
guaranteed optimal solution to the problem of deciding when to ask for
information and when to try to deduce it.  But the distinction described
has performed quite well and seems to embody a very appropriate criterion.
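
The ordering can be expressed as two mirror-image procedures (a sketch;
the deduce, ask-user, and laboratory-data helpers are schematic):

% def findout(attribute, obj):
%     if is_lab_data(attribute):
%         # Results of quantitative tests: the physician probably
%         # knows them, so ask first; deduce only if the answer is
%         # unknown.
%         value = ask_user(attribute, obj)
%         if value is None:
%             value = deduce(attribute, obj)
%     else:
%         # Otherwise deduce first, to keep the number of questions
%         # to a minimum; ask only when deduction fails.
%         value = deduce(attribute, obj)
%         if value is None:       # total evidence fell in (-.2, .2)
%             value = ask_user(attribute, obj)
%     return value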

	Two other additions to straightforward tree search increase the
inference engine's efficiency.  First, before the entire list of rules for
a subgoal is retrieved, the program attempts to find a sequence of rules
that would establish the goal with certainty, based only on what is
currently known.  Since this is a search for a sequence of rules with CF=1,\index{unity path}
the result is termed a {\sl unity path}.  Besides efficiency considerations,
this process offers the advantage of allowing the program to make ``common
sense'' deductions with a minimum of effort (rules with CF$=$1 are largely
definitional).  Because there are few such rules in the system, the search
is typically very brief.\index{partial evaluation}
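
A sketch of the search (the representation of rules and of what is
currently ``known'' is schematic):

% def unity_path(goal, known, rules):
%     # Search for a chain of definitional (CF = 1) rules that would
%     # establish `goal' with certainty from what is currently known;
%     # no new subgoals are traced along the way.
%     for rule in rules:
%         if rule.cf == 1.0 and rule.concludes(goal):
%             if all(c in known or unity_path(c, known, rules)
%                    for c in rule.premise_conditions()):
%                 return True
%     return False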

	Second, the inference engine performs a partial evaluation of rule
premises.  Since many attributes are found in several rules, the value of
one clause (perhaps the last) in a premise may already have been
established while the rest are still unknown.  If this clause alone would
make the premise false, there is clearly no reason to do all the search
necessary to establish the others. Each premise is thus ``previewed'' by
evaluating it on the basis of currently available information. This
produces a Boolean combination of TRUEs, FALSEs, and UNKNOWNs;
straightforward simplification (\eg, $F ∧ U ≡ F$) indicates whether the
rule is guaranteed to fail.  This procedure is examined in more detail in
Section 2--4.4.
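
The simplification itself is tiny; a three-valued sketch:

% # Three-valued "preview" of a conjunctive premise: T, F, or U
% # (unknown, i.e. not yet traced).  Since F AND U = F, one clause
% # already known to be false condemns the rule without any search.
% def preview_and(clause_values):
%     if "F" in clause_values:
%         return "F"     # guaranteed to fail; discard the rule
%     if "U" in clause_values:
%         return "U"     # might still succeed; worth pursuing
%     return "T"
% 
% print(preview_and(["T", "U", "F"]))   # F: rule rejected unexamined
% print(preview_and(["T", "U", "U"]))   # U: rule retained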

	The final aspect of the control structure is the tree of objects
(or contexts) that is constructed dynamically from a fixed hierarchy as the
consultation proceeds (Figure 2--6).  This tree serves several
purposes.  First, bindings of free variables in a rule are established by
the context in which the rule is invoked, with the standard access to
contexts that are its ancestors.  Second, since this tree is used to
represent the relationships of objects in the domain, it helps structure
the consultation in ways already familiar to the user.  For example, in the
medical domain, a patient has one or more infections, each of which may
have one or more associated cultures, each of which in turn may have one or
more organisms growing in it, and so on.
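
A sketch of the tree and of the variable binding it supports
(illustrative Python; the hierarchy shown is the medical one of
Figure 2--6):

% # The static hierarchy: each object type hangs beneath one parent
% # type.
% HIERARCHY = {"INFECTION": "PATIENT", "CULTURE": "INFECTION",
%              "ORGANISM": "CULTURE"}
% 
% class Context:
%     def __init__(self, name, kind, parent=None):
%         self.name, self.kind, self.parent = name, kind, parent
% 
%     def ancestor(self, kind):
%         # Free variables in a rule are bound by the context in which
%         # it is invoked, with access to that context's ancestors.
%         node = self
%         while node is not None and node.kind != kind:
%             node = node.parent
%         return node
% 
% def sprout(name, kind, parent):
%     # The tree is grown dynamically, but only along the hierarchy.
%     assert HIERARCHY[kind] == parent.kind
%     return Context(name, kind, parent)
% 
% patient   = Context("PATIENT-1", "PATIENT")
% infection = sprout("INFECTION-1", "INFECTION", patient)
% culture   = sprout("CULTURE-1", "CULTURE", infection)
% organism  = sprout("ORGANISM-1", "ORGANISM", culture)
% print(organism.ancestor("PATIENT").name)   # PATIENT-1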

\topinsert{\vskip 42.7pc \vfill		% Figure 2-6
	\figtitle 2-6{A sample tree of objects.}}

% 
%                             PATIENT-1
%                                 |
%                                 |
%                            INFECTION-1
%                              /     \
%                             /       \
%           PREVIOUS-CULTURE-1         CULTURE-1
%                  |                   /       \
%                  |                  /         \
%         PREVIOUS-ORGANISM-1   ORGANISM-1   ORGANISM-2
% 
\comment{.endind inference engine}

\subsectionbegin[3.3]{Domain Independence and Range of Application}
\index{domain independence}The fundamental design and implementation
of the system in 
Figure 2--3 does not restrict its use to medical domains.
This is due primarily to the modularization suggested in that figure and to
the fact that {\TEIRESIAS} is oriented solely around the concepts of rules,
attribute-object-value triples, etc.

	The clear distinction between the inference engine and the
knowledge base, for instance, makes it possible to remove one knowledge
base from the performance program and replace it with another.  It has
proven possible to build separate knowledge bases for such
disparate areas as auto mechanics and psychiatric chemotherapy.  In
the first such effort ([vanMelle74]), a small part of an auto repair
manual was rewritten as production rules and inserted in place of the
bacteremia knowledge base.  What resulted was a very simple but fully
functional consultant capable of diagnosing and suggesting remedies for
problems in parts of an auto electrical system.  More recently, a pilot
system for psychiatric diagnosis and chemotherapy was assembled.  While
this program had only 50 rules, it too was fully functional and displayed
primitive but encouraging performance.  In both systems, all of the
established explanation facilities worked as designed, without
modification.

	There are, naturally, some domains that might be less profitable to
explore.  One of the interesting lessons of the auto repair system was that
domains with little inexactness in the reasoning process---those for which
algorithmic diagnostic routines can be written---are not particularly
appropriate for this methodology.  The precision in these domains means
that little use is made of the certainty factor mechanism, and many of the
more complicated (and computationally expensive) features go unused.  The
effect would be akin to swatting a fly with a large doctoral thesis---all
that work and weight are unnecessary when something far simpler would do.\index{inference rules}

	Nor is it reasonable to expect to be able to write inference rules
built from attribute-object-value triples for an arbitrary domain.  As
knowledge in an area accumulates, it becomes progressively more formalized.
There is a certain stage in this formalization process when it is
appropriate to use rules of the sort shown above.  Earlier than this
the knowledge is too unstructured; later on it may (like the auto repair
system) be more effective to write straightforward algorithms.

	It is also possible that the knowledge in some domains is
inherently unsuited to a rule-like representation, since rules become
increasingly awkward as the number of premise clauses increases.  Dealing
with a number of interacting factors may be difficult for any
representation, but given the reliance here on rules as a medium of
communication of knowledge, the problem becomes especially significant.

	Finally, the current performance program makes extensive use of the\index{associative triple}
attribute-object-value associative triple.  This is perhaps the most
limiting factor, since this kind of representation is effective only in
simpler domains.  It is also, however, a far more implementation-dependent
factor than the other two mentioned.  While it would be difficult, the
triples could be replaced by a more powerful representation scheme without
adversely affecting the feasibility of the techniques for knowledge
acquisition and use that are described in subsequent chapters.

\sectionskip
\sectionbegin[4]{Production rules}
While many of the ideas on which {\TEIRESIAS} is based are applicable to a
range\index{knowledge representation}
of knowledge representations, its design and implementation have been strongly
influenced by the use of a rule-based encoding of knowledge.  This section
explores production rules from several perspectives, to provide a basis for some
of the issues discussed in later chapters.

\comment{.beginind production rules}
\subsectionbegin[4.1]{Production Rules in General}
	Production rules have been the subject of much work since their
introduction by Post ([Post43]) as a general computational mechanism.
Across the wide range of variations and techniques explored, there appears
to be a set of fundamental characteristics intrinsic to the methodology. (An
extended discussion of these characteristics can be found in [Davis77a].)

	The present discussion will be concerned with
two problems  typically encountered in using production rules.  They are
described briefly here, so that their manifestations will be evident in
later discussions of system performance.  They are also described in terms which
should make it clear that these problems are inherent in the production system
methodology itself. This will enable the reader to distinguish them from
shortcomings that may have arisen from our approach to various uses of meta-level
knowledge. Thus, while {\TEIRESIAS} displays certain limitations,
some of these are a result of the approach, while others are a legacy of the
particular representation of knowledge chosen early in the development of the
performance program.

	One problem is the limit on the amount of knowledge
that can be expressed conveniently in a single rule.  Actions that are
``larger'' than this limit are often achieved by the combined effects of several
rules.  For many reasons (see [Davis77a], section 5), this is difficult to
do and often produces opaque results.  We will see that this is a significant
consideration when judging  the utility of production rules as a knowledge
representation.

\index{implicit context}Second, there is what has been labeled the implicit context problem.
Perhaps the simplest example can be found in a production-rule-based system
that uses a sequential left-hand side (LHS) scan---that is, it starts out
at the beginning of the rule set and searches sequentially through it, examining
the LHS of each rule until one is found that meets the desired criteria.  Suppose
the LHS of the first three rules are:

% R1:                  A ∧ B ∧ C  ==>  . . .
% R2:                          D  ==>  . . .
% R3:                      E ∧ F  ==>  . . .

\topinsert{\vskip 5pc \figtitle 2-7{The left-hand sides of the first three rules.}}
\noindent
R3 won't  be tested unless D is false and either A, B, or C is
false. The rule thus, effectively, has two extra, implicit clauses of the form
$(not D)$ and $((not A) ∨ (not B) ∨ (not C))$, simply because of its
location in the rule set.  Note that the location is not {\sl necessarily}
critical---the entire thrust of the rule could possibly be summed up in the
presence of just E and F.  But often the location {\sl is} an important element
and the implicit clauses are significant.\ffnote{Indeed, [Waterman70] uses this idea to
great advantage.}

It is tempting to make use of implicit context to take advantage of the resulting
conciseness---consider how  short R3 is and how long an explicit
statement of the 99th rule might be if it depended on the failure of the first
98.
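
The effect is easy to reproduce (a Python sketch of a sequential LHS
scan; the state and the rule bodies are arbitrary):

% # A sequential LHS scan over an ordered rule set.  R3 is reached
% # only after R1 and R2 have failed, so it silently acquires the
% # extra clauses (not D) and ((not A) or (not B) or (not C)).
% RULES = [("R1", lambda s: s["A"] and s["B"] and s["C"]),
%          ("R2", lambda s: s["D"]),
%          ("R3", lambda s: s["E"] and s["F"])]
% 
% def first_applicable(state):
%     for name, lhs in RULES:
%         if lhs(state):
%             return name
%     return None
% 
% s = dict(A=True, B=True, C=True, D=False, E=True, F=True)
% print(first_applicable(s))   # R1 -- R3 never tested, though E, F hold
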
	While the point has been illustrated via an ordered rule set, the same
problem can arise from other causes. {\sc MYCIN}'s rules are not
ordered, but they are classified according to the object to which they
apply and similar problems arise from this.

	The essential point here is twofold: First, {\sl any} piece of structuring
in the system carries information.  Second, this information can be implicitly
passed on to the rules in subtle ways. We will see examples of this in
discussing the organization of meta-rules in chapter 7.

\subsectionbegin[4.2]{Production Rules as a Knowledge Representation}
{\TEIRESIAS}'s first goal---putting a domain expert in direct communication
with a high performance program---requires a common language of
communication.  This is especially important for the specification of
knowledge that the expert wants to add to the system. He must be able to
express it in a form that is the same as (or transformable into) the one
used internally by the program.  Thus, the ease of establishing the link
between expert and program is strongly affected by the knowledge
representation chosen.\index{knowledge representation}

	While we cannot offer formal arguments, there are several reasons
why production rules of the sort shown in Figure 2--4 are a useful
representation.  First, the general task of deduction is one that fits
quite well into the situation/action character of production rules.  There
is therefore less transformation necessary between the knowledge as
expressed by the expert and its final encoding inside the system.

	Next, the rules are, by themselves, comprehensible chunks of
knowledge, since they carry in their premise a full specification of their
applicability.  Their independence also facilitates the incremental growth
of the knowledge base: Rules can be added one by one, and performance
improves with each addition.

	Rules are also retrieved and invoked on the basis of the content of
their right-hand side (rather than on the basis of their name) and never
reference one another.  As a result, adding (or changing) a rule is a far
easier task than would be the case if there were extensive references to
the rule's name in many places.  Consider the difficulty, by contrast, of
editing a procedure in an {\sc ALGOL}-like language and then having to
go back and find all the places it is called to see if those calls are
still appropriate.

	The rules also seem to capture a ``chunk'' of knowledge of the
appropriate size. A significant amount of knowledge of bacteremia diagnosis
and therapy has been encoded in rules that have between two and five
premise clauses and one or two actions.  While all domains may not offer
such convenient ``bite-size'' chunks, the success of the systems in the other
two domains that have been explored is encouraging.

\index{backward chaining}The control structure---backward
chaining---also appears to be
reasonably intuitive.  While such {\sl modus ponens} reasoning is not often
recognized explicitly, it is common cognitive behavior and should therefore
not be alien to the expert.

	Finally, the rules are, for the most part, what may be labeled a
``single level'' mechanism.  They are composed of elements that are
conceptual primitives and require no further decomposition to be
understood.  It is clear to the user, for instance, what is meant by ``the
site is X,'' or ``conclude that the identity is Y.''  Compare this with the
difficulty that would arise if the action of a rule were the invocation of
a deep and complex calculation.

	As a result of all of these factors, production rules of the sort
shown above have proven to be an effective and useful representation.

\subsectionbegin[4.3]{Impact on Knowledge Organization}
\index{knowledge organization}\index{procedural encoding}Production rules also
contribute their own perspective on the fundamental
organization of knowledge in a program, a perspective quite different from that
associated with procedure-oriented representations. Production rules tend to
de-emphasize the hierarchical control structure natural to procedural
languages
and substitute a single, uniform collection of knowledge ``chunks'' (the rules).
Since each rule is retrieved on the basis of its contents (as the rule in
Figure 2--4
is retrieved on the basis of its conclusion), no rule is ever called
directly, in the style of procedures.  Thus, the addition (or deletion) of a rule
does not require the modification of any other rule to provide for (or
delete) a
call to it.  The result is a program whose knowledge base is easily modified
and whose behavior is relatively stable in the face of those changes.

	This stability might be demonstrated by repeatedly
removing rules from a production-rule-based program.  Many such systems will
continue to display some sort of ``reasonable'' behavior, up to a point.  By
contrast, adding a procedure to an {\sc ALGOL}-like program requires
modification of other parts of the code to insure that it is invoked, while
removing an arbitrary procedure from such a program generally cripples it.

	Note that the issue here is more than simply the ``undefined function''
error message that would result from a missing procedure. The problem persists
even if the compiler or interpreter is altered to treat undefined functions as
no-ops. The issue is a much more fundamental one concerning organization of
knowledge: Programs written in procedure-oriented languages stress the kind of
explicit passing of control from one section of code to another that is
characterized by the calling of procedures.  This is typically done at
a selected time and in a particular context---both carefully chosen by the
programmer.  If a no-op is substituted for a missing procedure, the
environment upon
return will not be what the programmer expected, and subsequent procedure
calls will be executed in increasingly incorrect environments. Similarly,
procedures that have been added must be called from {\sl somewhere} in the
program, but the location of the call must be chosen carefully if the effect is
to be meaningful.

	Production systems, on the other hand,
emphasize the decoupling of control flow from the writing of rules.  Each rule
is designed to be a modular chunk of knowledge with its own statement of
relevance.  Thus, where the {\sc ALGOL} programmer carefully chooses the
order of procedure calls to create a selected sequence of environments, in a
production system it is the environment which chooses the next rule for
execution.  And since a rule can only be chosen if its criteria of relevance have
been met, the choice will continue to be a plausible one, and system behavior
will remain ``reasonable,'' even as rules are successively deleted.
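
The contrast can be put in one short loop (a sketch; premise and action
are schematic predicates and operators, not any particular system's
code):

% # The production-system cycle: the current state selects the next
% # rule.  No rule calls another by name, so deleting a rule merely
% # removes one candidate from the match.
% def run(state, rules):
%     while True:
%         applicable = [r for r in rules if r.premise(state)]
%         if not applicable:
%             return state             # nothing relevant; halt
%         applicable[0].action(state)  # act, then re-match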

\subsectionbegin[4.4]{Production Rules as a High-Level Language}
\index{high-level language}As noted, the inference rules are
composed of clauses made up of
predicate functions and attribute-object-value triples.  The entire collection
of these elements forms a set of conceptual primitives for any given domain.
Thus in dealing with infectious disease, for example, there are {\tt cultures} with
{\tt sites} like {\tt blood}.

If we push this back one level, it becomes plausible to treat
the concepts {\sl predicate function, attribute, object,} and {\sl value} as
conceptual primitives in the more general domain of knowledge
representation.  We
consider each of them  an indication of a whole class of
objects, with individual instances supplied by the domain of application.  This\index{data types}
suggests treating them as extended data types, which is a useful
analogy to keep in mind.  There are 13 such ``data types,'' used
as a set of conceptual primitives for expressing knowledge in rule form.
They are the data types of what is, in effect, a high-level programming
language---one whose syntax is very restricted and whose sole statement
type is a rule.  Since
we refer to them often in what follows, they are listed and described
below, for reference:

% predicate function
% attribute              (abbreviated as ATTRIB) 
% object                 (for historical reasons, also called  
%                         a ``context,'' abbreviated as CNTXT)
% value 
% certainty factor       (a real number between -1 and 1, often
%                         abbreviated CF)
% list                   (the standard {\sc LISP} concept)
% atom                   (the standard {\sc LISP} concept)
% string                 (the standard {\sc LISP} concept)
% slotname               (discussed in chapter 6)
% slotexpert             (discussed in chapter 6)
% blank                  (discussed in chapter 6)
% advice                 (discussed in chapter 6)
% table                  (discussed in chapter 6)
\vskip 14pc

	This concept, of a high-level language with a restricted syntax and
a small number of data types, forms an important base underlying many of
the capabilities of {\TEIRESIAS}.  Perhaps the most fundamental of these is the
ability to make multiple uses of a single body of knowledge.  Because {\TEIRESIAS}
can both assemble and dissect rules, we have a system that can not only use
its knowledge of a domain directly, but can also examine it, alter it,
abstract it, and draw conclusions about it.  The rules are thus, at
different times, both code and data and are used in both capacities almost
equally.  A large number of interesting and useful features follow from
this; they are explored in subsequent chapters.  To clarify this point,
however, we offer a simple but illustrative example.

As indicated earlier in our discussion of the control structure, before\index{partial evaluation}
invoking a rule the system performs a partial evaluation of its premise to make
sure that the rule is not already guaranteed to fail.  But performing this evaluation
is nontrivial.  The system requires a way to tell if any clause in the premise
is known to be false.  It cannot simply EVALuate each clause individually, since a
subgoal that had never been traced before would send the system off on its
recursive search.

	However, if the system can establish which attribute is used in
the clause, it is possible to determine  whether this attribute
has been traced previously
(by reference to internal flags).  If so, the clause can be EVALuated to obtain
the value.

	This process is made possible by a {\tt TEMPLATE} associated with each predicate
function.  It describes the format of any call of that predicate function, by
giving the generic type and order of the arguments to the function. It thus
resembles a simplified procedure declaration.

% Function           Template                Sample function call
% 
%  SAME        (SAME CNTXT ATTRIB VALUE)   (SAME CNTXT SITE BLOOD)

\vskip 7pc
\figtitle 2-8{Example of a function template.}
\noindent
The template is not itself a piece of code but is simply a list structure
of the sort shown above, indicating the appearance of an interpreted call
to the predicate function.  Since rules are kept in interpreted form, as
shown in Figure 2--4, the template can be used as a guide to dissect a
rule.  For each clause in a rule, {\TEIRESIAS} retrieves the template associated with
the predicate function found in that clause (\ie, the template associated
with the {\tt CAR} of the clause)  and uses it to guide the examination of
the clause.  In the case of the function {\tt SAME}, for instance, the
template indicates that the attribute ({\tt ATTRIB}) is the third element of
the list structure that comprises the function call.  The previewing
mechanism uses the templates to extract the attribute from the clause in
question and can then determine whether or not it has been traced.
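
In sketch form (Python; the template table and the traced-flag check
stand in for the real {\sc INTERLISP} mechanisms):

% # Each predicate function carries a TEMPLATE describing its calls;
% # the previewer uses it to dissect a clause without EVALuating it.
% TEMPLATES = {"SAME":  ("SAME",  "CNTXT", "ATTRIB", "VALUE"),
%              "MEMBF": ("MEMBF", "CNTXT", "ATTRIB", "VALUE")}
% 
% def attribute_of(clause):
%     # clause is an interpreted call, e.g. ("SAME", "CNTXT", "SITE",
%     # "BLOOD"); the template is keyed by the CAR of the clause.
%     template = TEMPLATES[clause[0]]
%     return clause[template.index("ATTRIB")]   # the third element
% 
% attr = attribute_of(("SAME", "CNTXT", "SITE", "BLOOD"))   # "SITE"
% # The previewer can now consult the traced-flag for SITE before
% # deciding whether this clause's value is already available.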

	There are two points of interest here:

\listbegin
\numlistentry[1]{Part of the system is examining the code (the rules) being
executed by another part.}
\numlistentry[2]{This examination is guided by the information carried in
components of the rules themselves.}
\listend

\noindent
The ability to examine the code could have been accomplished by requiring
all predicate functions to use the same format, but this is obviously
awkward.  Allowing each function to describe the format of its own calls
permits code to be stylized without being constrained to a single form, and\index{flexibility}
hence is flexible and much easier to use.  This approach requires only that
each form be expressible in a template built from the current set of
conceptual primitives.  It also insures that the capability will persist in
the face of future additions to the system. The result is one example of
the general idea of giving the system access to, and an ``understanding'' of,
its own representations.  Additional examples of this concept are spread
throughout the remainder of this work.

\comment{.endind production rules}
\sectionskip
\sectionbegin[5]{Levels of knowledge}
\index{meta-level knowledge}The concept of meta-level knowledge
introduced in chapter 1 can be defined
more generally as multiple levels of knowledge.\ffnote{There
have been other uses of the term ``levels of knowledge,'' most
notably to describe the hierarchy of domain knowledge in the {\sc HEARSAY II}
system [Lesser74] (\eg, phoneme, word, syntax, semantics, etc.).  We use the term here in a
different sense, that of a hierarchy of {\sl types} of knowledge, and intend that
meaning throughout.}  This idea has several
important applications in the work reported here.  In chapter 7,\index{strategies}
strategies are defined in terms of a knowledge hierarchy with an arbitrary
number of levels.  A different sort of hierarchy is responsible for much\index{knowledge acquisition}
of the performance and generality of {\TEIRESIAS}'s knowledge acquisition
routines.  In this hierarchy, knowledge is stratified into three distinct levels.  A brief
description of it here will help to clarify many of the ideas involved
and illustrate their power.

	The first level of the hierarchy contains the
object-level knowledge---medical knowledge of cultures,
organisms, drugs, etc.  High performance on the task of diagnosis and therapy
selection is supported by an extensive collection of knowledge about objects in
the medical domain. This first level is naturally limited to this single domain.\index{knowledge representation}

	The next level is concerned with the conceptual building blocks
of the knowledge representation---the predicate functions, attributes, values,
rules, and so on.  Performance on the task of knowledge acquisition is dependent
upon an extensive body of knowledge about these building blocks.  That is, there
is assembled at this level a large amount of knowledge about the representational
primitives.  As will become clear in chapters 5 and  6, the system has an
``understanding'' of what an attribute is, what roles it plays, etc.
Since no reference is made at this level to any specific instance of any of the
primitives, this level of knowledge has a degree of domain independence.  Over
the range of domains in which knowledge can be represented in terms of these
primitives, the knowledge acquisition routines are similarly domain
independent.

	Knowledge at the third level is concerned with the conceptual
primitives behind representations in general.  To make this clearer, consider
the recursion of ideas: To aid the construction of a high-performance
(object-level) program, we build a (meta-level) system that can acquire
object-level
knowledge.  Its performance at this task is based on an extensive store of
knowledge about specific representations.  {\sl But it in turn is ``just'' another
knowledge-based system.} By supplying the proper set of
representation-independent primitives (and a store of knowledge about them),
we can use
precisely the same formalism (indeed, the same code) to provide a system for
acquiring knowledge about individual representations.\fnote{The 
details and examples of this bootstrapping process are given in chapter 6.}

	In this way, the second-order system can be used to acquire knowledge
about a representation.  This, in turn, becomes the knowledge base for the
meta-level
system, which then facilitates the construction of the knowledge base for
the object-level performance program.  The two stages of this process are shown
below.

% 
%             teaching about        teaching about the
%            a representation      domain of application
% +---------------+      +---------------+       +--------------+
% |knowledge of   |      |knowledge of   |       |              |
% |representation-|      |primitives for |       | object-level |
% |independent    |===>  |a specific     |  ===> |knowledge base|
% |primitives     |      |representation |       |              |
% +---------------+      +---------------+       +--------------+
% 
%    SYSTEM 2                SYSTEM 1                SYSTEM 0
% 

\vskip 12pc
\figtitle 2-9{The conceptual process.}
\noindent
Note, however, that while this indicates the process as it appears
conceptually, a more accurate system view is shown below.  Here we have
only two systems, because there is in fact only a single higher level
system.  This is possible because the process of teaching about a
{\sl representation} can be made computationally identical to the process of
teaching about specific {\sl instances} of that representation.  The two are
therefore accomplished with precisely the same mechanism.

% 
%                        teaching about the
%                       domain of application
%          +---------------+               +----------------+
%   +----->|  meta-level   |               |  object-level  |
%   |  +-->|    system     |     ===>      |     system     |
%   |  |   +---------------+               +----------------+
%   |  |          | |
%   |  +----------+ |
%   +---------------+
%       teaching about
%      a representation
% 

\topinsert{\vskip 12pc \figtitle 2-{10}{The computational process.}}

	In {\TEIRESIAS}, the recursive application of the knowledge acquisition
system offers a certain degree of representation independence; its extent
and limitations are examined in chapter 6.

	We began this report by noting that the original search for\index{generality}
generality---set in the domain of problem-solving methods---has proved
unsuccessful thus far.  Knowledge-based methods have been suggested as an
alternative, but each has a sharply limited range of application, and the
lure of generality remains.  Is there another way to salvage it?

	One of the underlying themes of this work is the attempt to capture\index{knowledge acquisition}
a different form of generality, one that has its source in knowledge
acquisition rather than in problem-solving methods.  That is, if programs
require large stores of knowledge for performance, then can we not take a
step back and discover powerful, broadly applicable techniques for
accomplishing the transfer of knowledge from expert to program?  The
resulting man-machine combination would be a semi-automatic system, whose
generality arose from access to the appropriate human experts and whose
power was based on the store of knowledge it acquired from them.  The next
five chapters discuss several steps toward the realization of this aim.

\worldend